Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 2257 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 755 |
| Duplicate rows (%) | 33.5% |
| Total size in memory | 211.7 KiB |
| Average record size in memory | 96.1 B |
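The derived figures in this table follow from the raw counts. A minimal sketch of that arithmetic, using only values reported above (the variable names are illustrative, not part of the report):

```python
# Values copied from the dataset statistics table.
n_rows = 2257
duplicate_rows = 755
total_size_kib = 211.7

# Share of rows flagged as duplicates, as a percentage.
duplicate_pct = 100 * duplicate_rows / n_rows

# Average record size in bytes (1 KiB = 1024 bytes). Because the KiB figure
# is itself rounded, this only matches the reported 96.1 B approximately.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```

With a third of the rows duplicated, it may be worth checking whether the repeats are genuine repeated mixes or copy artefacts before modelling.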
Variable types
| Categorical | 3 |
|---|---|
| Numeric | 9 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 755 (33.5%) duplicate rows | Duplicates |
| Cement_O.P.C_(Kgperm3) is highly correlated with Type_of_course_Aggregate and 4 other fields | High correlation |
| WaterCement_Ratio is highly correlated with Type_of_course_Aggregate and 2 other fields | High correlation |
| Water_Content_(Kgperm3) is highly correlated with WaterCement_Ratio | High correlation |
| Total_Aggregate_(Kgperm3) is highly correlated with Type_of_Fine_Aggregate_ and 1 other field | High correlation |
| Fine_Aggregate_(Kgperm3) is highly correlated with Max._Size_of_Coarse_Aggregate_(mm) and 2 other fields | High correlation |
| Coarse_Aggregate_(Kgperm3) is highly correlated with Fine_Aggregate_(Kgperm3) | High correlation |
| Hardened_Concrete_Desnity_(avg.) is highly correlated with Total_Aggregate_(Kgperm3) | High correlation |
| Type_of_course_Aggregate is highly correlated with Type_of_Fine_Aggregate_ and 2 other fields | High correlation |
| Type_of_Fine_Aggregate_ is highly correlated with Type_of_course_Aggregate and 2 other fields | High correlation |
| Max._Size_of_Coarse_Aggregate_(mm) is highly correlated with Fine_Aggregate_(Kgperm3) | High correlation |
| Hardened_Concrete_Desnity_(avg.) is highly skewed (γ1 = 45.50589334) | Skewed |
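The skewness alert reports γ1 for Hardened_Concrete_Desnity_(avg.). As a rough illustration of how a single extreme value can inflate this statistic, the sketch below computes the population (Fisher-Pearson) skewness in plain Python. The `skewness` helper and the toy data are assumptions made for illustration, not the report's own code. pandas' `Series.skew` applies a sample-size bias correction, so its values differ slightly.

```python
def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up toy data (not the real column): values spread evenly over 2400..2410.
typical = [2400 + (i % 11) for i in range(220)]

# The same data plus one value roughly ten times too large.
with_outlier = typical + [24003]

print(skewness(typical))       # symmetric data, so the result is 0
print(skewness(with_outlier))  # large and positive
```

If the extreme values in the real column turn out to be data-entry errors, correcting or excluding them would be expected to bring γ1 much closer to zero.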
Reproduction
| Analysis started | 2022-11-08 17:58:31.476845 |
|---|---|
| Analysis finished | 2022-11-08 17:59:19.441627 |
| Duration | 47.96 seconds |
| Software version | pandas-profiling v3.4.0 |
Type_of_course_Aggregate
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.8 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 1 |
| Mean length | 3.573327426 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8065 |
|---|---|
| Distinct characters | 12 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
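As a sketch of how such a per-category character count can be derived, the snippet below uses Python's standard `unicodedata` module on the value counts reported for this variable. The `value_counts` dictionary is copied from the Common Values table; the rest is an illustrative assumption about the computation, not the profiler's actual code.

```python
import unicodedata
from collections import Counter

# Value counts copied from this variable's Common Values table.
value_counts = {"0": 1289, "Natural": 598, "Crushed": 370}

category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # unicodedata.category returns a two-letter general category code,
        # e.g. "Ll" (lowercase letter), "Lu" (uppercase letter), "Nd" (decimal number).
        category_counts[unicodedata.category(ch)] += count

total_chars = sum(category_counts.values())
print(dict(category_counts))
print(total_chars)
```

The totals agree with the report: 5808 lowercase letters, 1289 decimal numbers and 968 uppercase letters, 8065 characters in all.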
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Crushed |
|---|---|
| 2nd row | Crushed |
| 3rd row | Crushed |
| 4th row | Natural |
| 5th row | Natural |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1289 | 57.1% |
| Natural | 598 | 26.5% |
| Crushed | 370 | 16.4% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 1289 | |
| natural | 598 | |
| crushed | 370 | 16.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1289 | |
| a | 1196 | |
| u | 968 | |
| r | 968 | |
| N | 598 | |
| t | 598 | |
| l | 598 | |
| C | 370 | 4.6% |
| s | 370 | 4.6% |
| h | 370 | 4.6% |
| Other values (2) | 740 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5808 | |
| Decimal Number | 1289 | 16.0% |
| Uppercase Letter | 968 | 12.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1196 | |
| u | 968 | |
| r | 968 | |
| t | 598 | |
| l | 598 | |
| s | 370 | 6.4% |
| h | 370 | 6.4% |
| e | 370 | 6.4% |
| d | 370 | 6.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 598 | |
| C | 370 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1289 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6776 | |
| Common | 1289 | 16.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 1196 | |
| u | 968 | |
| r | 968 | |
| N | 598 | |
| t | 598 | |
| l | 598 | |
| C | 370 | 5.5% |
| s | 370 | 5.5% |
| h | 370 | 5.5% |
| e | 370 | 5.5% |
Common
| Value | Count | Frequency (%) |
| 0 | 1289 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8065 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1289 | |
| a | 1196 | |
| u | 968 | |
| r | 968 | |
| N | 598 | |
| t | 598 | |
| l | 598 | |
| C | 370 | 4.6% |
| s | 370 | 4.6% |
| h | 370 | 4.6% |
| Other values (2) | 740 |
Type_of_Fine_Aggregate_
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.8 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 1 |
| Mean length | 3.571112096 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8060 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Natural |
|---|---|
| 2nd row | Natural |
| 3rd row | Natural |
| 4th row | Natural |
| 5th row | Natural |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1289 | 57.1% |
| Natural | 955 | 42.3% |
| Crushed | 12 | 0.5% |
| 40 | 1 | < 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 1289 | |
| natural | 955 | |
| crushed | 12 | 0.5% |
| 40 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 1910 | |
| 0 | 1290 | |
| u | 967 | |
| r | 967 | |
| N | 955 | |
| t | 955 | |
| l | 955 | |
| C | 12 | 0.1% |
| s | 12 | 0.1% |
| h | 12 | 0.1% |
| Other values (3) | 25 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5802 | |
| Decimal Number | 1291 | 16.0% |
| Uppercase Letter | 967 | 12.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1910 | |
| u | 967 | |
| r | 967 | |
| t | 955 | |
| l | 955 | |
| s | 12 | 0.2% |
| h | 12 | 0.2% |
| e | 12 | 0.2% |
| d | 12 | 0.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1290 | |
| 4 | 1 | 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 955 | |
| C | 12 | 1.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6769 | |
| Common | 1291 | 16.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 1910 | |
| u | 967 | |
| r | 967 | |
| N | 955 | |
| t | 955 | |
| l | 955 | |
| C | 12 | 0.2% |
| s | 12 | 0.2% |
| h | 12 | 0.2% |
| e | 12 | 0.2% |
Common
| Value | Count | Frequency (%) |
| 0 | 1290 | |
| 4 | 1 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8060 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 1910 | |
| 0 | 1290 | |
| u | 967 | |
| r | 967 | |
| N | 955 | |
| t | 955 | |
| l | 955 | |
| C | 12 | 0.1% |
| s | 12 | 0.1% |
| h | 12 | 0.1% |
| Other values (3) | 25 | 0.3% |
Max._Size_of_Coarse_Aggregate_(mm)
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.8 KiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 4514 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 40 |
|---|---|
| 2nd row | 40 |
| 3rd row | 20 |
| 4th row | 20 |
| 5th row | 20 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 1547 | 68.5% |
| 40 | 710 | 31.5% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 20 | 1547 | |
| 40 | 710 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 2257 | |
| 2 | 1547 | |
| 4 | 710 | 15.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 4514 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2257 | |
| 2 | 1547 | |
| 4 | 710 | 15.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 4514 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 2257 | |
| 2 | 1547 | |
| 4 | 710 | 15.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4514 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 2257 | |
| 2 | 1547 | |
| 4 | 710 | 15.7% |
Cement_O.P.C_(Kgperm3)
Real number (ℝ≥0)
| Distinct | 23 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 362.7204253 |
| Minimum | 220 |
| Maximum | 450 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 220 |
|---|---|
| 5-th percentile | 330 |
| Q1 | 350 |
| median | 365 |
| Q3 | 375 |
| 95-th percentile | 400 |
| Maximum | 450 |
| Range | 230 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 20.26800255 |
|---|---|
| Coefficient of variation (CV) | 0.0558777536 |
| Kurtosis | 4.293691674 |
| Mean | 362.7204253 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | -0.4009837665 |
| Sum | 818660 |
| Variance | 410.7919275 |
| Monotonicity | Not monotonic |
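Several of the statistics in these tables are simple functions of the others. A minimal sketch of those relationships, using the figures reported for the variable summarised above (the variable names are illustrative):

```python
# Values copied from the statistics tables of this variable.
n = 2257                      # number of observations in the dataset
total = 818660                # Sum
std = 20.26800255             # Standard deviation
q1, q3 = 350, 375             # Q1 and Q3
minimum, maximum = 220, 450

mean = total / n                  # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation
iqr = q3 - q1                     # Interquartile range (IQR)
value_range = maximum - minimum   # Range

print(mean, cv, variance, iqr, value_range)
```

The same relationships can be used as a quick consistency check on the other numeric variables in the report.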
Histogram with fixed size bins (bins=23)
| Value | Count | Frequency (%) |
| 350 | 727 | |
| 375 | 621 | |
| 365 | 213 | 9.4% |
| 400 | 173 | 7.7% |
| 360 | 126 | 5.6% |
| 340 | 87 | 3.9% |
| 370 | 59 | 2.6% |
| 325 | 51 | 2.3% |
| 380 | 39 | 1.7% |
| 385 | 34 | 1.5% |
| Other values (13) | 127 | 5.6% |
| Value | Count | Frequency (%) |
| 220 | 2 | 0.1% |
| 250 | 4 | 0.2% |
| 300 | 13 | 0.6% |
| 310 | 9 | 0.4% |
| 315 | 3 | 0.1% |
| 320 | 10 | 0.4% |
| 325 | 51 | |
| 330 | 25 | 1.1% |
| 335 | 26 | 1.2% |
| 340 | 87 |
| Value | Count | Frequency (%) |
| 450 | 6 | 0.3% |
| 410 | 15 | 0.7% |
| 400 | 173 | 7.7% |
| 390 | 4 | 0.2% |
| 385 | 34 | 1.5% |
| 380 | 39 | 1.7% |
| 375 | 621 | |
| 370 | 59 | 2.6% |
| 365 | 213 | 9.4% |
| 360 | 126 | 5.6% |
WaterCement_Ratio
Real number (ℝ≥0)
| Distinct | 21 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4893176783 |
| Minimum | 0.33 |
| Maximum | 0.62 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 0.33 |
|---|---|
| 5-th percentile | 0.44 |
| Q1 | 0.47 |
| median | 0.49 |
| Q3 | 0.51 |
| 95-th percentile | 0.54 |
| Maximum | 0.62 |
| Range | 0.29 |
| Interquartile range (IQR) | 0.04 |
Descriptive statistics
| Standard deviation | 0.03223190861 |
|---|---|
| Coefficient of variation (CV) | 0.06587113044 |
| Kurtosis | 1.742037642 |
| Mean | 0.4893176783 |
| Median Absolute Deviation (MAD) | 0.02 |
| Skewness | -0.1263135102 |
| Sum | 1104.39 |
| Variance | 0.001038895933 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=21)
| Value | Count | Frequency (%) |
| 0.51 | 358 | |
| 0.5 | 307 | |
| 0.48 | 290 | |
| 0.49 | 240 | |
| 0.47 | 195 | |
| 0.45 | 189 | |
| 0.46 | 151 | |
| 0.54 | 110 | 4.9% |
| 0.44 | 108 | 4.8% |
| 0.53 | 101 | 4.5% |
| Other values (11) | 208 |
| Value | Count | Frequency (%) |
| 0.33 | 6 | 0.3% |
| 0.4 | 11 | 0.5% |
| 0.41 | 9 | 0.4% |
| 0.42 | 15 | 0.7% |
| 0.43 | 15 | 0.7% |
| 0.44 | 108 | 4.8% |
| 0.45 | 189 | |
| 0.46 | 151 | |
| 0.47 | 195 | |
| 0.48 | 290 |
| Value | Count | Frequency (%) |
| 0.62 | 3 | 0.1% |
| 0.6 | 12 | 0.5% |
| 0.57 | 3 | 0.1% |
| 0.56 | 22 | 1.0% |
| 0.55 | 25 | 1.1% |
| 0.54 | 110 | 4.9% |
| 0.53 | 101 | 4.5% |
| 0.52 | 87 | 3.9% |
| 0.51 | 358 | |
| 0.5 | 307 |
Water_Content_(Kgperm3)
Real number (ℝ≥0)
| Distinct | 27 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 177.8529021 |
| Minimum | 100 |
| Maximum | 290 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 100 |
|---|---|
| 5-th percentile | 160 |
| Q1 | 170 |
| median | 180 |
| Q3 | 185 |
| 95-th percentile | 195 |
| Maximum | 290 |
| Range | 190 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 13.36793764 |
|---|---|
| Coefficient of variation (CV) | 0.07516288735 |
| Kurtosis | 12.46891977 |
| Mean | 177.8529021 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 1.184488228 |
| Sum | 401414 |
| Variance | 178.7017569 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=27)
| Value | Count | Frequency (%) |
| 180 | 431 | |
| 190 | 367 | |
| 185 | 296 | |
| 165 | 281 | |
| 175 | 256 | |
| 170 | 236 | |
| 160 | 184 | |
| 168 | 36 | 1.6% |
| 195 | 28 | 1.2% |
| 210 | 22 | 1.0% |
| Other values (17) | 120 | 5.3% |
| Value | Count | Frequency (%) |
| 100 | 3 | 0.1% |
| 107 | 2 | 0.1% |
| 145 | 6 | 0.3% |
| 150 | 6 | 0.3% |
| 153 | 8 | 0.4% |
| 155 | 7 | 0.3% |
| 157 | 3 | 0.1% |
| 160 | 184 | |
| 162 | 2 | 0.1% |
| 165 | 281 |
| Value | Count | Frequency (%) |
| 290 | 5 | 0.2% |
| 225 | 6 | 0.3% |
| 215 | 20 | 0.9% |
| 210 | 22 | 1.0% |
| 205 | 19 | 0.8% |
| 200 | 15 | 0.7% |
| 195 | 28 | 1.2% |
| 190 | 367 | |
| 185 | 296 | |
| 184 | 2 | 0.1% |
Total_Aggregate_(Kgperm3)
Real number (ℝ≥0)
| Distinct | 46 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1875.70226 |
| Minimum | 1185 |
| Maximum | 1980 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 1185 |
|---|---|
| 5-th percentile | 1825 |
| Q1 | 1860 |
| median | 1880 |
| Q3 | 1895 |
| 95-th percentile | 1916 |
| Maximum | 1980 |
| Range | 795 |
| Interquartile range (IQR) | 35 |
Descriptive statistics
| Standard deviation | 42.580066 |
|---|---|
| Coefficient of variation (CV) | 0.02270086619 |
| Kurtosis | 113.5267728 |
| Mean | 1875.70226 |
| Median Absolute Deviation (MAD) | 20 |
| Skewness | -7.867644278 |
| Sum | 4233460 |
| Variance | 1813.062021 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=46)
| Value | Count | Frequency (%) |
| 1890 | 202 | 8.9% |
| 1875 | 171 | 7.6% |
| 1870 | 165 | 7.3% |
| 1885 | 155 | 6.9% |
| 1855 | 143 | 6.3% |
| 1880 | 132 | 5.8% |
| 1910 | 122 | 5.4% |
| 1860 | 121 | 5.4% |
| 1900 | 112 | 5.0% |
| 1915 | 112 | 5.0% |
| Other values (36) | 822 |
| Value | Count | Frequency (%) |
| 1185 | 2 | 0.1% |
| 1305 | 3 | 0.1% |
| 1340 | 1 | < 0.1% |
| 1775 | 4 | 0.2% |
| 1785 | 8 | 0.4% |
| 1795 | 2 | 0.1% |
| 1800 | 7 | 0.3% |
| 1805 | 16 | |
| 1810 | 20 | |
| 1815 | 13 |
| Value | Count | Frequency (%) |
| 1980 | 12 | 0.5% |
| 1970 | 4 | 0.2% |
| 1965 | 3 | 0.1% |
| 1940 | 13 | 0.6% |
| 1935 | 3 | 0.1% |
| 1930 | 14 | 0.6% |
| 1925 | 15 | 0.7% |
| 1923 | 3 | 0.1% |
| 1920 | 46 | |
| 1915 | 112 |
Fine_Aggregate_(Kgperm3)
Real number (ℝ≥0)
| Distinct | 79 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 715.7727071 |
| Minimum | 275 |
| Maximum | 865 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 275 |
|---|---|
| 5-th percentile | 610 |
| Q1 | 680 |
| median | 715 |
| Q3 | 760 |
| 95-th percentile | 815 |
| Maximum | 865 |
| Range | 590 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 63.79362782 |
|---|---|
| Coefficient of variation (CV) | 0.08912553829 |
| Kurtosis | 2.80833666 |
| Mean | 715.7727071 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | -0.6932806605 |
| Sum | 1615499 |
| Variance | 4069.62695 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 760 | 130 | 5.8% |
| 715 | 123 | 5.4% |
| 700 | 122 | 5.4% |
| 675 | 114 | 5.1% |
| 710 | 109 | 4.8% |
| 725 | 96 | 4.3% |
| 695 | 94 | 4.2% |
| 750 | 94 | 4.2% |
| 690 | 71 | 3.1% |
| 770 | 67 | 3.0% |
| Other values (69) | 1237 |
| Value | Count | Frequency (%) |
| 275 | 1 | < 0.1% |
| 360 | 4 | |
| 477 | 2 | 0.1% |
| 490 | 2 | 0.1% |
| 505 | 2 | 0.1% |
| 510 | 2 | 0.1% |
| 525 | 3 | 0.1% |
| 530 | 8 | |
| 540 | 6 | |
| 550 | 3 | 0.1% |
| Value | Count | Frequency (%) |
| 865 | 14 | |
| 860 | 8 | 0.4% |
| 855 | 3 | 0.1% |
| 850 | 6 | 0.3% |
| 842 | 3 | 0.1% |
| 840 | 15 | |
| 835 | 12 | |
| 830 | 21 | |
| 825 | 6 | 0.3% |
| 820 | 18 |
Coarse_Aggregate_(Kgperm3)
Real number (ℝ≥0)
| Distinct | 101 |
|---|---|
| Distinct (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1154.162605 |
| Minimum | 475 |
| Maximum | 1510 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 475 |
|---|---|
| 5-th percentile | 1055 |
| Q1 | 1128 |
| median | 1160 |
| Q3 | 1195 |
| 95-th percentile | 1250 |
| Maximum | 1510 |
| Range | 1035 |
| Interquartile range (IQR) | 67 |
Descriptive statistics
| Standard deviation | 77.72022965 |
|---|---|
| Coefficient of variation (CV) | 0.0673390641 |
| Kurtosis | 16.42015097 |
| Mean | 1154.162605 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | -2.807374046 |
| Sum | 2604945 |
| Variance | 6040.434097 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1200 | 140 | 6.2% |
| 1160 | 120 | 5.3% |
| 1170 | 116 | 5.1% |
| 1150 | 111 | 4.9% |
| 1180 | 98 | 4.3% |
| 1190 | 80 | 3.5% |
| 1185 | 79 | 3.5% |
| 1145 | 79 | 3.5% |
| 1205 | 75 | 3.3% |
| 1140 | 74 | 3.3% |
| Other values (91) | 1285 |
| Value | Count | Frequency (%) |
| 475 | 1 | < 0.1% |
| 635 | 1 | < 0.1% |
| 642 | 2 | |
| 660 | 3 | |
| 670 | 1 | < 0.1% |
| 680 | 2 | |
| 685 | 2 | |
| 690 | 1 | < 0.1% |
| 696 | 2 | |
| 705 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1510 | 1 | < 0.1% |
| 1365 | 4 | |
| 1345 | 3 | 0.1% |
| 1335 | 4 | |
| 1330 | 2 | 0.1% |
| 1320 | 5 | |
| 1315 | 6 | |
| 1305 | 8 | |
| 1295 | 2 | 0.1% |
| 1290 | 3 | 0.1% |
Workability_Slump_(mm)
Real number (ℝ≥0)
| Distinct | 38 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 150.1626052 |
| Minimum | 0 |
| Maximum | 230 |
| Zeros | 5 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 90 |
| Q1 | 120 |
| median | 150 |
| Q3 | 180 |
| 95-th percentile | 205 |
| Maximum | 230 |
| Range | 230 |
| Interquartile range (IQR) | 60 |
Descriptive statistics
| Standard deviation | 38.62195915 |
|---|---|
| Coefficient of variation (CV) | 0.2572009129 |
| Kurtosis | -0.5094335418 |
| Mean | 150.1626052 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | -0.1613442356 |
| Sum | 338917 |
| Variance | 1491.655729 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=38)
| Value | Count | Frequency (%) |
| 150 | 291 | |
| 200 | 248 | 11.0% |
| 120 | 195 | 8.6% |
| 100 | 167 | 7.4% |
| 170 | 145 | 6.4% |
| 140 | 124 | 5.5% |
| 130 | 103 | 4.6% |
| 180 | 99 | 4.4% |
| 160 | 92 | 4.1% |
| 190 | 87 | 3.9% |
| Other values (28) | 706 |
| Value | Count | Frequency (%) |
| 0 | 5 | 0.2% |
| 60 | 1 | < 0.1% |
| 65 | 4 | 0.2% |
| 70 | 22 | 1.0% |
| 75 | 10 | 0.4% |
| 80 | 49 | 2.2% |
| 85 | 8 | 0.4% |
| 90 | 49 | 2.2% |
| 95 | 21 | 0.9% |
| 100 | 167 |
| Value | Count | Frequency (%) |
| 230 | 18 | 0.8% |
| 225 | 2 | 0.1% |
| 220 | 49 | 2.2% |
| 215 | 6 | 0.3% |
| 210 | 36 | 1.6% |
| 205 | 15 | 0.7% |
| 200 | 248 | |
| 195 | 28 | 1.2% |
| 190 | 87 | 3.9% |
| 185 | 39 | 1.7% |
Hardened_Concrete_Desnity_(avg.)
Real number (ℝ≥0)
| Distinct | 143 |
|---|---|
| Distinct (%) | 6.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2429.733846 |
| Minimum | 202 |
| Maximum | 24003 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 202 |
|---|---|
| 5-th percentile | 2386 |
| Q1 | 2403 |
| median | 2407 |
| Q3 | 2452.67 |
| 95-th percentile | 2459 |
| Maximum | 24003 |
| Range | 23801 |
| Interquartile range (IQR) | 49.67 |
Descriptive statistics
| Standard deviation | 460.5104397 |
|---|---|
| Coefficient of variation (CV) | 0.1895312281 |
| Kurtosis | 2138.025559 |
| Mean | 2429.733846 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | 45.50589334 |
| Sum | 5483909.29 |
| Variance | 212069.8651 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2405 | 208 | 9.2% |
| 2404 | 165 | 7.3% |
| 2455 | 135 | 6.0% |
| 2403 | 132 | 5.8% |
| 2406 | 102 | 4.5% |
| 2454 | 87 | 3.9% |
| 2453 | 84 | 3.7% |
| 2456 | 66 | 2.9% |
| 2407 | 65 | 2.9% |
| 2402 | 65 | 2.9% |
| Other values (133) | 1148 |
| Value | Count | Frequency (%) |
| 202 | 2 | 0.1% |
| 1404 | 1 | < 0.1% |
| 2133 | 3 | 0.1% |
| 2369 | 2 | 0.1% |
| 2382 | 11 | 0.5% |
| 2383 | 9 | 0.4% |
| 2384 | 24 | |
| 2385 | 36 | |
| 2385.5 | 1 | < 0.1% |
| 2386 | 29 |
| Value | Count | Frequency (%) |
| 24003 | 1 | < 0.1% |
| 2627 | 3 | |
| 2508 | 3 | |
| 2496 | 2 | 0.1% |
| 2493 | 1 | < 0.1% |
| 2486 | 1 | < 0.1% |
| 2484 | 1 | < 0.1% |
| 2483 | 3 | |
| 2481 | 6 | |
| 2480 | 3 |
7_day_str
Real number (ℝ≥0)
| Distinct | 209 |
|---|---|
| Distinct (%) | 9.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.98697386 |
| Minimum | 12 |
| Maximum | 45.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.8 KiB |
Quantile statistics
| Minimum | 12 |
|---|---|
| 5-th percentile | 21.9 |
| Q1 | 27.4 |
| median | 31.2 |
| Q3 | 34.7 |
| 95-th percentile | 39.2 |
| Maximum | 45.9 |
| Range | 33.9 |
| Interquartile range (IQR) | 7.3 |
Descriptive statistics
| Standard deviation | 5.298545274 |
|---|---|
| Coefficient of variation (CV) | 0.1709926661 |
| Kurtosis | -0.1648309964 |
| Mean | 30.98697386 |
| Median Absolute Deviation (MAD) | 3.7 |
| Skewness | -0.08321793553 |
| Sum | 69937.6 |
| Variance | 28.07458202 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 28.3 | 35 | 1.6% |
| 27.7 | 32 | 1.4% |
| 25.9 | 32 | 1.4% |
| 34.6 | 31 | 1.4% |
| 31.1 | 31 | 1.4% |
| 35.9 | 29 | 1.3% |
| 32.4 | 29 | 1.3% |
| 31.4 | 29 | 1.3% |
| 33.1 | 28 | 1.2% |
| 26.8 | 28 | 1.2% |
| Other values (199) | 1953 |
| Value | Count | Frequency (%) |
| 12 | 2 | 0.1% |
| 15.6 | 3 | |
| 15.7 | 1 | < 0.1% |
| 15.9 | 2 | 0.1% |
| 17.9 | 4 | |
| 18.2 | 4 | |
| 18.3 | 4 | |
| 18.7 | 2 | 0.1% |
| 19 | 7 | |
| 19.2 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 45.9 | 9 | |
| 45.8 | 2 | 0.1% |
| 44.2 | 6 | |
| 42.6 | 6 | |
| 42.3 | 6 | |
| 42.1 | 6 | |
| 41.9 | 6 | |
| 41.8 | 4 | |
| 41.4 | 3 | 0.1% |
| 41.2 | 6 |
Auto
The auto setting is an easily interpretable pairwise column metric that selects a method by variable-type pair: categorical-categorical uses Cramér's V, numerical-categorical uses Cramér's V (on a discretized numerical column), and numerical-numerical uses Spearman's ρ. This configuration picks the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
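The rank-based calculation described above can be sketched in plain Python. The helper names (`average_ranks`, `spearman_rho`) and the toy data are assumptions for illustration; the report itself relies on library implementations.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = (sum((a - mx) ** 2 for a in rx) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in ry) / n) ** 0.5
    return cov / (sx * sy)


# Toy data: y grows monotonically but nonlinearly with x (y = x**3).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because the relationship is perfectly monotonic, ρ comes out as 1 (up to floating-point rounding) even though the relationship is not linear.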
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
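A minimal sketch of this definition, together with a check of the location and scale invariance mentioned above. The `pearson_r` helper and the toy numbers are assumptions for illustration only.

```python
from statistics import fmean, pstdev


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


# Toy data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)

# Shift and rescale each variable separately (with positive scale factors).
x_shifted = [2 * a + 3 for a in x]
y_shifted = [10 * b - 1 for b in y]
r_transformed = pearson_r(x_shifted, y_shifted)

print(r_original, r_transformed)
```

Both calls give the same value up to floating-point rounding. Note that a negative scale factor would flip the sign of r.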
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
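The pair-counting definition above corresponds to what is often called tau-a. A minimal sketch follows; the function name and toy data are illustrative assumptions. Libraries such as SciPy default to the tie-adjusted tau-b variant, which can give different values when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie, which counts as neither
    return (concordant - discordant) / len(pairs)


# Toy data: one swapped pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

Here two of the three pairs are concordant and one is discordant, so τ = (2 − 1) / 3.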
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
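Returning to the bias-corrected Cramér's V described in the correlation notes above, the correction can be sketched in plain Python from a contingency table. This follows the formula usually attributed to Bergsma (2013). The function name and the toy tables are assumptions, no continuity correction is applied, and the profiler's own implementation may differ in such details.

```python
def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts.

    Assumes every row and column total is non-zero.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    return (phi2_corrected / min(k_corrected - 1, r_corrected - 1)) ** 0.5


# Toy tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
print(v_perfect, v_independent)
```

The two toy tables sit at the ends of the scale: the first gives a value of 1 (up to rounding) and the second gives 0.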
First rows
| | Type_of_course_Aggregate | Type_of_Fine_Aggregate_ | Max._Size_of_Coarse_Aggregate_(mm) | Cement_O.P.C_(Kgperm3) | WaterCement_Ratio | Water_Content_(Kgperm3) | Total_Aggregate_(Kgperm3) | Fine_Aggregate_(Kgperm3) | Coarse_Aggregate_(Kgperm3) | Workability_Slump_(mm) | Hardened_Concrete_Desnity_(avg.) | 7_day_str |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Crushed | Natural | 40 | 365 | 0.52 | 225 | 1870 | 710 | 1160 | 160 | 2407.0 | 21.8 |
| 1 | Crushed | Natural | 40 | 365 | 0.52 | 225 | 1870 | 710 | 1160 | 160 | 2403.0 | 26.3 |
| 2 | Crushed | Natural | 20 | 350 | 0.53 | 185 | 1915 | 725 | 1190 | 120 | 2475.0 | 32.7 |
| 3 | Natural | Natural | 20 | 340 | 0.49 | 165 | 1895 | 835 | 1060 | 190 | 2412.0 | 30.3 |
| 4 | Natural | Natural | 20 | 325 | 0.51 | 165 | 1910 | 840 | 1070 | 170 | 2404.0 | 27.0 |
| 5 | Natural | Natural | 20 | 325 | 0.51 | 165 | 1910 | 840 | 1070 | 170 | 2405.0 | 27.0 |
| 6 | Crushed | Natural | 20 | 340 | 0.56 | 190 | 1895 | 795 | 1100 | 105 | 2425.0 | 31.8 |
| 7 | Crushed | Natural | 20 | 340 | 0.56 | 190 | 1895 | 795 | 1100 | 105 | 2427.0 | 31.8 |
| 8 | Crushed | Natural | 20 | 370 | 0.50 | 185 | 1895 | 795 | 1100 | 125 | 2449.0 | 37.8 |
| 9 | Crushed | Natural | 20 | 360 | 0.53 | 190 | 1900 | 780 | 1120 | 100 | 2454.0 | 28.2 |
Last rows
| | Type_of_course_Aggregate | Type_of_Fine_Aggregate_ | Max._Size_of_Coarse_Aggregate_(mm) | Cement_O.P.C_(Kgperm3) | WaterCement_Ratio | Water_Content_(Kgperm3) | Total_Aggregate_(Kgperm3) | Fine_Aggregate_(Kgperm3) | Coarse_Aggregate_(Kgperm3) | Workability_Slump_(mm) | Hardened_Concrete_Desnity_(avg.) | 7_day_str |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2247 | 0 | 0 | 20 | 375 | 0.51 | 190 | 1875 | 750 | 1125 | 130 | 2444.0 | 26.9 |
| 2248 | 0 | 0 | 20 | 375 | 0.51 | 190 | 1855 | 755 | 1130 | 100 | 2454.0 | 24.5 |
| 2249 | 0 | 0 | 20 | 385 | 0.48 | 215 | 1810 | 685 | 1125 | 70 | 2414.0 | 22.3 |
| 2250 | 0 | 0 | 20 | 375 | 0.52 | 195 | 1860 | 570 | 1200 | 160 | 2435.0 | 41.0 |
| 2251 | 0 | 0 | 20 | 365 | 0.47 | 170 | 1865 | 700 | 1165 | 190 | 2404.0 | 34.8 |
| 2252 | 0 | 0 | 20 | 375 | 0.48 | 180 | 1845 | 665 | 1180 | 150 | 2404.0 | 24.8 |
| 2253 | 0 | 0 | 20 | 375 | 0.45 | 170 | 1855 | 725 | 1130 | 175 | 2404.0 | 34.7 |
| 2254 | 0 | 0 | 20 | 365 | 0.51 | 185 | 1900 | 800 | 1200 | 135 | 2454.0 | 35.4 |
| 2255 | 0 | 0 | 20 | 385 | 0.47 | 180 | 1815 | 655 | 1160 | 95 | 2386.0 | 25.9 |
| 2256 | 0 | 0 | 20 | 340 | 0.47 | 160 | 1900 | 760 | 1140 | 195 | 2402.0 | 25.9 |
Most frequently occurring
| | Type_of_course_Aggregate | Type_of_Fine_Aggregate_ | Max._Size_of_Coarse_Aggregate_(mm) | Cement_O.P.C_(Kgperm3) | WaterCement_Ratio | Water_Content_(Kgperm3) | Total_Aggregate_(Kgperm3) | Fine_Aggregate_(Kgperm3) | Coarse_Aggregate_(Kgperm3) | Workability_Slump_(mm) | Hardened_Concrete_Desnity_(avg.) | 7_day_str | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 184 | 0 | 0 | 20 | 375 | 0.49 | 185 | 1900 | 760 | 1140 | 175 | 2465.0 | 28.3 | 9 |
| 11 | 0 | 0 | 20 | 350 | 0.46 | 160 | 1890 | 755 | 1135 | 195 | 2403.0 | 34.1 | 6 |
| 16 | 0 | 0 | 20 | 350 | 0.47 | 165 | 1885 | 715 | 1170 | 180 | 2404.0 | 34.3 | 6 |
| 26 | 0 | 0 | 20 | 360 | 0.46 | 165 | 1865 | 785 | 1080 | 170 | 2394.0 | 40.0 | 6 |
| 30 | 0 | 0 | 20 | 360 | 0.46 | 170 | 1885 | 750 | 1135 | 150 | 2414.0 | 41.2 | 6 |
| 47 | 0 | 0 | 20 | 365 | 0.45 | 165 | 1870 | 730 | 1140 | 165 | 2405.0 | 35.4 | 6 |
| 54 | 0 | 0 | 20 | 365 | 0.47 | 170 | 1865 | 655 | 1210 | 160 | 2403.0 | 33.2 | 6 |
| 60 | 0 | 0 | 20 | 365 | 0.49 | 180 | 1895 | 780 | 1115 | 220 | 2442.0 | 35.9 | 6 |
| 63 | 0 | 0 | 20 | 365 | 0.51 | 185 | 1900 | 700 | 1200 | 160 | 2455.0 | 36.8 | 6 |
| 77 | 0 | 0 | 20 | 365 | 0.52 | 190 | 1885 | 735 | 1150 | 200 | 2445.1 | 37.9 | 6 |